Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
We study the implicit regularization imposed by gradient descent for learning multi-layer homogeneous functions, including feed-forward fully connected and convolutional deep neural networks with linear, ReLU or Leaky ReLU activation. We rigorously prove that gradient flow (i.e. gradient descent with infinitesimal step size) effectively enforces the differences between squared norms across different layers to remain invariant without any explicit regularization. This result implies that if the weights are initially small, gradient flow automatically balances the magnitudes of all layers. Using a discretization argument, we analyze gradient descent with positive step size for the non-convex low-rank asymmetric matrix factorization problem without any regularization. Inspired by our findings for gradient flow, we prove that gradient descent with step sizes \eta_t = O(t^{-(1/2+\delta)}) (0 < \delta \le 1/2) automatically balances the two low-rank factors and converges to a bounded global optimum.
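As a quick numerical illustration of the balancedness invariant, here is a minimal NumPy sketch (the network sizes, data, and step size are illustrative assumptions, not taken from the paper). It trains a two-layer linear network with a very small step size to approximate gradient flow and tracks the difference of squared Frobenius norms across the two layers, which should remain essentially constant throughout training:

```python
import numpy as np

# Minimal sketch (not the authors' code): two-layer linear network
# f(x) = W2 @ W1 @ x trained with tiny-step gradient descent to
# approximate gradient flow. The quantity ||W1||_F^2 - ||W2||_F^2
# should stay (numerically) invariant during training.
rng = np.random.default_rng(0)
d, h, n = 5, 4, 20                        # illustrative dimensions
X = rng.standard_normal((d, n))
Y = rng.standard_normal((3, n))
W1 = 0.1 * rng.standard_normal((h, d))    # small initialization
W2 = 0.1 * rng.standard_normal((3, h))
eta = 1e-4                                # tiny step ~ gradient flow

for step in range(20001):
    R = W2 @ W1 @ X - Y                   # residual
    g1 = W2.T @ R @ X.T / n               # dL/dW1 for L = ||R||_F^2 / (2n)
    g2 = R @ (W1 @ X).T / n               # dL/dW2
    W1 -= eta * g1
    W2 -= eta * g2
    if step % 5000 == 0:
        diff = np.sum(W1**2) - np.sum(W2**2)
        print(f"step {step:6d}  loss {np.sum(R**2)/(2*n):.4f}  "
              f"||W1||^2 - ||W2||^2 = {diff:.6f}")
```

The printed difference stays (up to discretization error of order eta) at its initial value, even as the loss and the individual layer norms change, matching the invariance statement in the abstract.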
Reviews: Algorithmic Regularization in Learning Deep Homogeneous Models: Layers are Automatically Balanced
Edit after author feedback, I do not wish to change my score:
- To me, balance of positive quantities is not only about their difference: they should have a similar order of magnitude. The difference between 1e-13 and 1 is pretty small, yet the two are clearly unbalanced.
- The same goes for the "Theta" and "poly" notation. None of the statements of the paper involving these notations has this feature: the variables epsilon, d, d1 and d2 are fixed and are never quantified with a "forall" quantifier. The fact that a notation is standard does not mean that it cannot be misused.

The authors consider deep learning models with a specific class of activation functions that ensures the model remains homogeneous: multiplying the weights of one layer by a positive scalar and dividing the weights of another layer by the same amount does not change the predictions of the network.
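As a concrete check of this homogeneity property, the sketch below (a hypothetical two-layer ReLU network, not taken from the paper or the review) rescales one layer by a positive constant c and the next layer by 1/c, and verifies that the outputs coincide:

```python
import numpy as np

# Minimal sketch of the rescaling invariance described above, for a
# two-layer ReLU network (the network and sizes are illustrative
# assumptions). Scaling one layer by c > 0 and the next by 1/c leaves
# predictions unchanged, since relu(c * z) = c * relu(z) for c > 0.
rng = np.random.default_rng(1)
W1 = rng.standard_normal((8, 5))
W2 = rng.standard_normal((3, 8))
x = rng.standard_normal(5)

def predict(W1, W2, x):
    return W2 @ np.maximum(W1 @ x, 0.0)    # ReLU hidden layer

c = 100.0                                  # arbitrary positive rescaling
out_original = predict(W1, W2, x)
out_rescaled = predict(c * W1, W2 / c, x)  # unbalanced but equivalent
print(np.allclose(out_original, out_rescaled))  # True
```

The two parameterizations make identical predictions despite having very different layer norms, which is exactly why balancedness cannot be read off from the function the network computes.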